Overview

Dataset statistics

Number of variables: 21
Number of observations: 18062
Missing cells: 68095
Missing cells (%): 18.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.9 MiB
Average record size in memory: 168.0 B
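The percentage and memory figures above can be cross-checked from the raw counts. The sketch below is a plain arithmetic check, not the profiler's own code. It assumes each column costs 8 bytes per row, which is the assumption that makes the record size come out at 168 B.

```python
# Figures copied from the "Dataset statistics" table above.
n_variables = 21
n_observations = 18062
missing_cells = 68095

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 8 bytes per value (a 64-bit number or an object pointer).
BYTES_PER_VALUE = 8
record_size_bytes = n_variables * BYTES_PER_VALUE
total_size_mib = n_observations * record_size_bytes / 2**20

print(f"total cells:   {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"record size:   {record_size_bytes} B")
print(f"total size:    {total_size_mib:.1f} MiB")
```

Under that assumption the results agree with the reported 18.0%, 168.0 B and 2.9 MiB.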

Variable types

Text: 10
DateTime: 1
Categorical: 7
Numeric: 3

Alerts

CCL_cm is highly overall correlated with CCW_cm and 1 other field [High correlation]
CCW_cm is highly overall correlated with CCL_cm and 2 other fields [High correlation]
Weight_Kg is highly overall correlated with CCL_cm and 2 other fields [High correlation]
CaptureSite is highly overall correlated with ForagingGround and 1 other field [High correlation]
ForagingGround is highly overall correlated with CaptureSite and 1 other field [High correlation]
LandingSite is highly overall correlated with CaptureSite and 1 other field [High correlation]
Sex is highly overall correlated with CCW_cm and 1 other field [High correlation]
Researcher is highly imbalanced (63.0%) [Imbalance]
CaptureMethod is highly imbalanced (81.3%) [Imbalance]
Species is highly imbalanced (66.5%) [Imbalance]
Sex is highly imbalanced (95.1%) [Imbalance]
Tag_2 has 13151 (72.8%) missing values [Missing]
Lost_Tags has 17137 (94.9%) missing values [Missing]
T_Number has 18024 (99.8%) missing values [Missing]
Weight_Kg has 5409 (29.9%) missing values [Missing]
Sex has 4330 (24.0%) missing values [Missing]
Status has 3633 (20.1%) missing values [Missing]
Date_TimeRelease has 6108 (33.8%) missing values [Missing]
Rescue_ID has unique values [Unique]
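The missing-value alerts can be ranked by severity with a few lines of arithmetic. This is a sketch built only from the counts in the alert list; the helper name `missing_share` is illustrative and not part of any library.

```python
# Missing-value counts copied from the Alerts list above.
N_OBSERVATIONS = 18062
missing_counts = {
    "Tag_2": 13151,
    "Lost_Tags": 17137,
    "T_Number": 18024,
    "Weight_Kg": 5409,
    "Sex": 4330,
    "Status": 3633,
    "Date_TimeRelease": 6108,
}


def missing_share(count: int, n_rows: int = N_OBSERVATIONS) -> float:
    """Return the percentage of rows in which a column is missing."""
    return 100 * count / n_rows


# Sort from most to least affected so the worst columns come first.
ranked = sorted(missing_counts.items(), key=lambda kv: kv[1], reverse=True)
for name, count in ranked:
    print(f"{name:<18}{count:>6}  {missing_share(count):5.1f}%")
```

Three of the seven flagged columns (T_Number, Lost_Tags, Tag_2) are missing in more than 70% of rows, which is worth keeping in mind before using them as model inputs.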

Reproduction

Analysis started: 2023-08-30 12:53:12.446368
Analysis finished: 2023-08-30 12:53:16.645753
Duration: 4.2 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

Rescue_ID
Text

UNIQUE 

Distinct: 18062
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 141.2 KiB

Length

Max length: 12
Median length: 12
Mean length: 12
Min length: 12

Characters and Unicode

Total characters: 216744
Distinct characters: 13
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 18062
Unique (%): 100.0%

Sample

1st row: 2000_RE_0060
2nd row: 2001_RE_0187
3rd row: 2001_RE_0197
4th row: 2002_RE_0031
5th row: 2002_RE_0118
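The fixed length of 12 and the 13 distinct characters (ten digits plus `_`, `R` and `E`) suggest that every Rescue_ID follows one rigid shape. A minimal validation sketch follows. The pattern is inferred from the five sample rows only, so treat it as an assumption rather than a documented format.

```python
import re

# Assumed pattern: four digits, the literal "_RE_", then four more digits.
RESCUE_ID_PATTERN = re.compile(r"[0-9]{4}_RE_[0-9]{4}")


def is_valid_rescue_id(value: str) -> bool:
    """Return True if value has the NNNN_RE_NNNN shape seen in the sample."""
    return RESCUE_ID_PATTERN.fullmatch(value) is not None


# Sample rows copied from the report.
sample_ids = [
    "2000_RE_0060",
    "2001_RE_0187",
    "2001_RE_0197",
    "2002_RE_0031",
    "2002_RE_0118",
]

for rescue_id in sample_ids:
    print(rescue_id, is_valid_rescue_id(rescue_id))
```

Running the same check over the full column would flag any identifier that breaks the shape, for example a lowercase `re` or a shortened counter.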
ValueCountFrequency (%)
2000_re_0060 1
 
< 0.1%
2003_re_0363 1
 
< 0.1%
2002_re_0119 1
 
< 0.1%
2002_re_0214 1
 
< 0.1%
2002_re_0215 1
 
< 0.1%
2011_re_0476 1
 
< 0.1%
2002_re_0218 1
 
< 0.1%
2003_re_0322 1
 
< 0.1%
2003_re_0411 1
 
< 0.1%
2002_re_0031 1
 
< 0.1%
Other values (18052) 18052
99.9%

Most occurring characters

ValueCountFrequency (%)
0 44210
20.4%
_ 36124
16.7%
2 26185
12.1%
1 25593
11.8%
R 18062
8.3%
E 18062
8.3%
3 7862
 
3.6%
4 7617
 
3.5%
5 7303
 
3.4%
8 6969
 
3.2%
Other values (3) 18757
8.7%

Most occurring categories

Value                  Count   Frequency (%)
Decimal Number         144496  66.7%
Connector Punctuation  36124   16.7%
Uppercase Letter       36124   16.7%
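The category breakdown above corresponds to the Unicode general category of each character. The sketch below shows one way to reproduce such a table with Python's standard `unicodedata` module, run on the five sample IDs only. It is an illustration, not the profiler's implementation.

```python
import unicodedata
from collections import Counter

# Readable names for the three general categories that occur in Rescue_ID.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Pc": "Connector Punctuation",
    "Lu": "Uppercase Letter",
}


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            # Fall back to the two-letter code for categories not named above.
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


sample_ids = [
    "2000_RE_0060",
    "2001_RE_0187",
    "2001_RE_0197",
    "2002_RE_0031",
    "2002_RE_0118",
]

counts = category_counts(sample_ids)
total = sum(counts.values())
for name, count in counts.most_common():
    print(f"{name:<22}{count:>4}  {100 * count / total:.1f}%")
```

Because every ID has eight digits, two underscores and two letters, the sample shows the same 66.7% / 16.7% / 16.7% split as the full column.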

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 44210
30.6%
2 26185
18.1%
1 25593
17.7%
3 7862
 
5.4%
4 7617
 
5.3%
5 7303
 
5.1%
8 6969
 
4.8%
7 6800
 
4.7%
6 6572
 
4.5%
9 5385
 
3.7%
Uppercase Letter
ValueCountFrequency (%)
R 18062
50.0%
E 18062
50.0%
Connector Punctuation
ValueCountFrequency (%)
_ 36124
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 180620
83.3%
Latin 36124
 
16.7%

Most frequent character per script

Common
ValueCountFrequency (%)
0 44210
24.5%
_ 36124
20.0%
2 26185
14.5%
1 25593
14.2%
3 7862
 
4.4%
4 7617
 
4.2%
5 7303
 
4.0%
8 6969
 
3.9%
7 6800
 
3.8%
6 6572
 
3.6%
Latin
ValueCountFrequency (%)
R 18062
50.0%
E 18062
50.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 216744
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 44210
20.4%
_ 36124
16.7%
2 26185
12.1%
1 25593
11.8%
R 18062
8.3%
E 18062
8.3%
3 7862
 
3.6%
4 7617
 
3.5%
5 7303
 
3.4%
8 6969
 
3.2%
Other values (3) 18757
8.7%
Distinct: 5237
Distinct (%): 29.0%
Missing: 0
Missing (%): 0.0%
Memory size: 141.2 KiB
Minimum: 1998-04-17 00:00:00
Maximum: 2018-12-31 00:00:00
Histogram with fixed size bins (bins=50)
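To read the 50-bin histogram it helps to know how much time each bin covers. The sketch below derives that from the minimum and maximum above. It assumes the values carry dates only (both endpoints sit at 00:00:00), so the day coverage figure is approximate if some rows also hold a time of day.

```python
from datetime import datetime

# Values copied from the Minimum and Maximum rows above.
minimum = datetime(1998, 4, 17)
maximum = datetime(2018, 12, 31)
N_BINS = 50        # from the histogram caption
N_DISTINCT = 5237  # distinct values reported for this variable

span = maximum - minimum
bin_width_days = span.days / N_BINS

# Inclusive number of calendar days between the two endpoints.
calendar_days = span.days + 1
# Assumption: each distinct value is a distinct calendar day.
coverage = N_DISTINCT / calendar_days

print(f"span:      {span.days} days")
print(f"bin width: {bin_width_days:.2f} days")
print(f"coverage:  {100 * coverage:.1f}% of calendar days have a record")
```

Each bar therefore spans roughly five months, so short seasonal patterns are not visible at this resolution.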

Researcher
Categorical

IMBALANCE 

Distinct: 35
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 141.2 KiB

Researcher_20: 9778
Researcher_30: 5346
Researcher_7: 1011
Researcher_25: 515
Researcher_10: 347
Other values (30): 1065

Length

Max length: 13
Median length: 13
Mean length: 12.930683
Min length: 12

Characters and Unicode

Total characters: 233554
Distinct characters: 18
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 9
Unique (%): < 0.1%

Sample

1st row: Researcher_25
2nd row: Researcher_6
3rd row: Researcher_6
4th row: Researcher_32
5th row: Researcher_25

Common Values

Value              Count  Frequency (%)
Researcher_20      9778   54.1%
Researcher_30      5346   29.6%
Researcher_7       1011   5.6%
Researcher_25      515    2.9%
Researcher_10      347    1.9%
Researcher_32      339    1.9%
Researcher_13      337    1.9%
Researcher_6       131    0.7%
Researcher_37      41     0.2%
Researcher_4       36     0.2%
Other values (25)  181    1.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
researcher_20 9778
54.1%
researcher_30 5346
29.6%
researcher_7 1011
 
5.6%
researcher_25 515
 
2.9%
researcher_10 347
 
1.9%
researcher_32 339
 
1.9%
researcher_13 337
 
1.9%
researcher_6 131
 
0.7%
researcher_37 41
 
0.2%
researcher_4 36
 
0.2%
Other values (25) 181
 
1.0%

Most occurring characters

ValueCountFrequency (%)
e 54186
23.2%
r 36124
15.5%
R 18062
 
7.7%
s 18062
 
7.7%
a 18062
 
7.7%
c 18062
 
7.7%
h 18062
 
7.7%
_ 18062
 
7.7%
0 15475
 
6.6%
2 10672
 
4.6%
Other values (8) 8725
 
3.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 162558
69.6%
Decimal Number 34872
 
14.9%
Uppercase Letter 18062
 
7.7%
Connector Punctuation 18062
 
7.7%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 15475
44.4%
2 10672
30.6%
3 6112
 
17.5%
7 1089
 
3.1%
1 800
 
2.3%
5 520
 
1.5%
6 135
 
0.4%
4 40
 
0.1%
9 26
 
0.1%
8 3
 
< 0.1%
Lowercase Letter
ValueCountFrequency (%)
e 54186
33.3%
r 36124
22.2%
s 18062
 
11.1%
a 18062
 
11.1%
c 18062
 
11.1%
h 18062
 
11.1%
Uppercase Letter
ValueCountFrequency (%)
R 18062
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 18062
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 180620
77.3%
Common 52934
 
22.7%

Most frequent character per script

Common
ValueCountFrequency (%)
_ 18062
34.1%
0 15475
29.2%
2 10672
20.2%
3 6112
 
11.5%
7 1089
 
2.1%
1 800
 
1.5%
5 520
 
1.0%
6 135
 
0.3%
4 40
 
0.1%
9 26
 
< 0.1%
Latin
ValueCountFrequency (%)
e 54186
30.0%
r 36124
20.0%
R 18062
 
10.0%
s 18062
 
10.0%
a 18062
 
10.0%
c 18062
 
10.0%
h 18062
 
10.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 233554
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 54186
23.2%
r 36124
15.5%
R 18062
 
7.7%
s 18062
 
7.7%
a 18062
 
7.7%
c 18062
 
7.7%
h 18062
 
7.7%
_ 18062
 
7.7%
0 15475
 
6.6%
2 10672
 
4.6%
Other values (8) 8725
 
3.7%

CaptureSite
Categorical

HIGH CORRELATION 

Distinct: 29
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 141.2 KiB

CaptureSite_25: 2574
CaptureSite_9: 2169
CaptureSite_15: 1986
CaptureSite_23: 1906
CaptureSite_16: 957
Other values (24): 8470

Length

Max length: 14
Median length: 14
Mean length: 13.717971
Min length: 13

Characters and Unicode

Total characters: 247774
Distinct characters: 20
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: CaptureSite_0
2nd row: CaptureSite_0
3rd row: CaptureSite_0
4th row: CaptureSite_0
5th row: CaptureSite_0

Common Values

Value              Count  Frequency (%)
CaptureSite_25     2574   14.3%
CaptureSite_9      2169   12.0%
CaptureSite_15     1986   11.0%
CaptureSite_23     1906   10.6%
CaptureSite_16     957    5.3%
CaptureSite_13     821    4.5%
CaptureSite_1      793    4.4%
CaptureSite_21     714    4.0%
CaptureSite_27     686    3.8%
CaptureSite_5      646    3.6%
Other values (19)  4810   26.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
capturesite_25 2574
14.3%
capturesite_9 2169
12.0%
capturesite_15 1986
11.0%
capturesite_23 1906
 
10.6%
capturesite_16 957
 
5.3%
capturesite_13 821
 
4.5%
capturesite_1 793
 
4.4%
capturesite_21 714
 
4.0%
capturesite_27 686
 
3.8%
capturesite_5 646
 
3.6%
Other values (19) 4810
26.6%

Most occurring characters

ValueCountFrequency (%)
t 36124
14.6%
e 36124
14.6%
C 18062
7.3%
a 18062
7.3%
p 18062
7.3%
u 18062
7.3%
r 18062
7.3%
S 18062
7.3%
i 18062
7.3%
_ 18062
7.3%
Other values (10) 31030
12.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 162558
65.6%
Uppercase Letter 36124
 
14.6%
Decimal Number 31030
 
12.5%
Connector Punctuation 18062
 
7.3%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 8187
26.4%
1 7838
25.3%
5 5206
16.8%
3 2848
 
9.2%
9 2584
 
8.3%
7 1306
 
4.2%
6 1182
 
3.8%
4 886
 
2.9%
0 702
 
2.3%
8 291
 
0.9%
Lowercase Letter
ValueCountFrequency (%)
t 36124
22.2%
e 36124
22.2%
a 18062
11.1%
p 18062
11.1%
u 18062
11.1%
r 18062
11.1%
i 18062
11.1%
Uppercase Letter
ValueCountFrequency (%)
C 18062
50.0%
S 18062
50.0%
Connector Punctuation
ValueCountFrequency (%)
_ 18062
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 198682
80.2%
Common 49092
 
19.8%

Most frequent character per script

Common
ValueCountFrequency (%)
_ 18062
36.8%
2 8187
16.7%
1 7838
16.0%
5 5206
 
10.6%
3 2848
 
5.8%
9 2584
 
5.3%
7 1306
 
2.7%
6 1182
 
2.4%
4 886
 
1.8%
0 702
 
1.4%
Latin
ValueCountFrequency (%)
t 36124
18.2%
e 36124
18.2%
C 18062
9.1%
a 18062
9.1%
p 18062
9.1%
u 18062
9.1%
r 18062
9.1%
S 18062
9.1%
i 18062
9.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 247774
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 36124
14.6%
e 36124
14.6%
C 18062
7.3%
a 18062
7.3%
p 18062
7.3%
u 18062
7.3%
r 18062
7.3%
S 18062
7.3%
i 18062
7.3%
_ 18062
7.3%
Other values (10) 31030
12.5%

ForagingGround
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 141.2 KiB

Creek: 11408
Ocean: 6651
creek: 3

Length

Max length: 5
Median length: 5
Mean length: 5
Min length: 5

Characters and Unicode

Total characters: 90310
Distinct characters: 8
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Ocean
2nd row: Ocean
3rd row: Ocean
4th row: Ocean
5th row: Ocean

Common Values

Value  Count  Frequency (%)
Creek  11408  63.2%
Ocean  6651   36.8%
creek  3      < 0.1%
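The three distinct values are really two: `creek` differs from `Creek` only by case. A small sketch of how the counts could be merged before modelling, using the figures from the table above. The helper name `merge_case_variants` is illustrative.

```python
from collections import Counter

# Counts copied from the Common Values table above.
raw_counts = Counter({"Creek": 11408, "Ocean": 6651, "creek": 3})


def merge_case_variants(counts: Counter) -> Counter:
    """Fold labels that differ only by case into one capitalised label."""
    merged = Counter()
    for label, count in counts.items():
        # casefold() removes case differences; capitalize() restores a
        # single canonical spelling such as "Creek".
        merged[label.strip().casefold().capitalize()] += count
    return merged


clean_counts = merge_case_variants(raw_counts)
for label, count in clean_counts.most_common():
    print(f"{label:<6}{count:>6}")
```

After merging, Creek accounts for 11411 rows, which matches the lowercase count the report lists after the Common Values plot.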

Length

Histogram of lengths of the category

Common Values (Plot)

Value  Count  Frequency (%)
creek  11411  63.2%
ocean  6651   36.8%

Most occurring characters

ValueCountFrequency (%)
e 29473
32.6%
r 11411
 
12.6%
k 11411
 
12.6%
C 11408
 
12.6%
c 6654
 
7.4%
O 6651
 
7.4%
a 6651
 
7.4%
n 6651
 
7.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 72251
80.0%
Uppercase Letter 18059
 
20.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 29473
40.8%
r 11411
 
15.8%
k 11411
 
15.8%
c 6654
 
9.2%
a 6651
 
9.2%
n 6651
 
9.2%
Uppercase Letter
ValueCountFrequency (%)
C 11408
63.2%
O 6651
36.8%

Most occurring scripts

ValueCountFrequency (%)
Latin 90310
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 29473
32.6%
r 11411
 
12.6%
k 11411
 
12.6%
C 11408
 
12.6%
c 6654
 
7.4%
O 6651
 
7.4%
a 6651
 
7.4%
n 6651
 
7.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 90310
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 29473
32.6%
r 11411
 
12.6%
k 11411
 
12.6%
C 11408
 
12.6%
c 6654
 
7.4%
O 6651
 
7.4%
a 6651
 
7.4%
n 6651
 
7.4%

CaptureMethod
Categorical

IMBALANCE 

Distinct: 15
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 141.2 KiB

Net: 15934
Longline: 1464
Jarife: 198
Uzio: 119
Beached: 100
Other values (10): 247

Length

Max length: 17
Median length: 3
Mean length: 3.5816632
Min length: 3

Characters and Unicode

Total characters: 64692
Distinct characters: 33
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: Net
2nd row: Net
3rd row: Net
4th row: Net
5th row: Beached

Common Values

Value              Count  Frequency (%)
Net                15934  88.2%
Longline           1464   8.1%
Jarife             198    1.1%
Uzio               119    0.7%
Beached            100    0.6%
Not_Recorded       86     0.5%
Collected Floater  73     0.4%
net                28     0.2%
By Hand            26     0.1%
stranded           22     0.1%
Other values (5)   12     0.1%
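The report also lists lowercase word counts for this variable, where `net` appears 15962 times. That figure can be reproduced from the table above by lowercasing each value and splitting it on whitespace. The tokenisation rule is an assumption inferred from the numbers, not taken from the profiler's source, and only the ten listed values are used.

```python
from collections import Counter

# The ten most common raw values, copied from the Common Values table above.
# The remaining five distinct values (12 rows) are not itemised in the
# report, so they are left out here.
common_values = {
    "Net": 15934,
    "Longline": 1464,
    "Jarife": 198,
    "Uzio": 119,
    "Beached": 100,
    "Not_Recorded": 86,
    "Collected Floater": 73,
    "net": 28,
    "By Hand": 26,
    "stranded": 22,
}


def word_counts(value_counts):
    """Lowercase each value, split on whitespace, and total the words."""
    words = Counter()
    for value, count in value_counts.items():
        for word in value.lower().split():
            words[word] += count
    return words


words = word_counts(common_values)
for word, count in words.most_common(5):
    print(f"{word:<10}{count:>6}")
```

`Net` and `net` collapse into one word, and `Collected Floater` contributes to two. Small gaps against the report for other words (for example `longline` or `by`) would come from the five unlisted values.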

Length

Histogram of lengths of the category
Value             Count  Frequency (%)
net               15962  87.9%
longline          1466   8.1%
jarife            199    1.1%
uzio              119    0.7%
beached           100    0.6%
not_recorded      86     0.5%
collected         73     0.4%
floater           73     0.4%
by                27     0.1%
hand              27     0.1%
Other values (4)  36     0.2%

Most occurring characters

ValueCountFrequency (%)
e 18242
28.2%
t 16216
25.1%
N 16020
24.8%
n 3011
 
4.7%
o 1903
 
2.9%
i 1790
 
2.8%
l 1687
 
2.6%
g 1468
 
2.3%
L 1464
 
2.3%
a 429
 
0.7%
Other values (23) 2462
 
3.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 46301
71.6%
Uppercase Letter 18199
 
28.1%
Space Separator 106
 
0.2%
Connector Punctuation 86
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 18242
39.4%
t 16216
35.0%
n 3011
 
6.5%
o 1903
 
4.1%
i 1790
 
3.9%
l 1687
 
3.6%
g 1468
 
3.2%
a 429
 
0.9%
d 416
 
0.9%
r 388
 
0.8%
Other values (10) 751
 
1.6%
Uppercase Letter
ValueCountFrequency (%)
N 16020
88.0%
L 1464
 
8.0%
J 198
 
1.1%
B 126
 
0.7%
U 119
 
0.7%
R 86
 
0.5%
F 79
 
0.4%
C 73
 
0.4%
H 26
 
0.1%
T 6
 
< 0.1%
Space Separator
ValueCountFrequency (%)
106
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 86
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 64500
99.7%
Common 192
 
0.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 18242
28.3%
t 16216
25.1%
N 16020
24.8%
n 3011
 
4.7%
o 1903
 
3.0%
i 1790
 
2.8%
l 1687
 
2.6%
g 1468
 
2.3%
L 1464
 
2.3%
a 429
 
0.7%
Other values (21) 2270
 
3.5%
Common
ValueCountFrequency (%)
106
55.2%
_ 86
44.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 64692
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 18242
28.2%
t 16216
25.1%
N 16020
24.8%
n 3011
 
4.7%
o 1903
 
2.9%
i 1790
 
2.8%
l 1687
 
2.6%
g 1468
 
2.3%
L 1464
 
2.3%
a 429
 
0.7%
Other values (23) 2462
 
3.8%

Fisher
Text

Distinct: 2085
Distinct (%): 11.5%
Missing: 0
Missing (%): 0.0%
Memory size: 141.2 KiB

Length

Max length: 11
Median length: 11
Mean length: 10.520374
Min length: 8

Characters and Unicode

Total characters: 190019
Distinct characters: 17
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1078
Unique (%): 6.0%

Sample

1st row: Fisher_1072
2nd row: Fisher_520
3rd row: Fisher_1669
4th row: Fisher_1798
5th row: Fisher_1918
ValueCountFrequency (%)
fisher_1478 1499
 
8.3%
fisher_92 570
 
3.2%
fisher_1550 448
 
2.5%
fisher_1551 435
 
2.4%
fisher_1472 383
 
2.1%
fisher_1217 300
 
1.7%
fisher_818 244
 
1.4%
fisher_873 236
 
1.3%
fisher_863 218
 
1.2%
fisher_1651 217
 
1.2%
Other values (2075) 13512
74.8%

Most occurring characters

ValueCountFrequency (%)
F 18062
9.5%
s 18062
9.5%
h 18062
9.5%
e 18062
9.5%
r 18062
9.5%
_ 18062
9.5%
i 18062
9.5%
1 14592
7.7%
8 7182
 
3.8%
5 7020
 
3.7%
Other values (7) 34791
18.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 90310
47.5%
Decimal Number 63585
33.5%
Uppercase Letter 18062
 
9.5%
Connector Punctuation 18062
 
9.5%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 14592
22.9%
8 7182
11.3%
5 7020
11.0%
7 5804
 
9.1%
4 5767
 
9.1%
2 5736
 
9.0%
9 4681
 
7.4%
0 4325
 
6.8%
3 4261
 
6.7%
6 4217
 
6.6%
Lowercase Letter
ValueCountFrequency (%)
s 18062
20.0%
h 18062
20.0%
e 18062
20.0%
r 18062
20.0%
i 18062
20.0%
Uppercase Letter
ValueCountFrequency (%)
F 18062
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 18062
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 108372
57.0%
Common 81647
43.0%

Most frequent character per script

Common
ValueCountFrequency (%)
_ 18062
22.1%
1 14592
17.9%
8 7182
 
8.8%
5 7020
 
8.6%
7 5804
 
7.1%
4 5767
 
7.1%
2 5736
 
7.0%
9 4681
 
5.7%
0 4325
 
5.3%
3 4261
 
5.2%
Latin
ValueCountFrequency (%)
F 18062
16.7%
s 18062
16.7%
h 18062
16.7%
e 18062
16.7%
r 18062
16.7%
i 18062
16.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 190019
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
F 18062
9.5%
s 18062
9.5%
h 18062
9.5%
e 18062
9.5%
r 18062
9.5%
_ 18062
9.5%
i 18062
9.5%
1 14592
7.7%
8 7182
 
3.8%
5 7020
 
3.7%
Other values (7) 34791
18.3%

LandingSite
Categorical

HIGH CORRELATION 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 141.2 KiB

LandingSite_CaptureSiteCategory_0: 8469
LandingSite_CaptureSiteCategory_2: 4258
LandingSite_CaptureSiteCategory_4: 2946
LandingSite_CaptureSiteCategory_1: 2312
LandingSite_CaptureSiteCategory_3: 77

Length

Max length: 33
Median length: 33
Mean length: 33
Min length: 33

Characters and Unicode

Total characters: 596046
Distinct characters: 21
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: LandingSite_CaptureSiteCategory_2
2nd row: LandingSite_CaptureSiteCategory_2
3rd row: LandingSite_CaptureSiteCategory_2
4th row: LandingSite_CaptureSiteCategory_2
5th row: LandingSite_CaptureSiteCategory_2

Common Values

Value                              Count  Frequency (%)
LandingSite_CaptureSiteCategory_0  8469   46.9%
LandingSite_CaptureSiteCategory_2  4258   23.6%
LandingSite_CaptureSiteCategory_4  2946   16.3%
LandingSite_CaptureSiteCategory_1  2312   12.8%
LandingSite_CaptureSiteCategory_3  77     0.4%
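LandingSite is flagged as highly correlated with CaptureSite, and the value names suggest each landing site is a grouping of capture sites. The report does not state which association measure produced the flag. Cramér's V is a common choice for pairs of categorical variables, and the sketch below uses it purely as an illustration. The two contingency tables are hypothetical and are not taken from the dataset.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical counts (rows = capture sites, columns = landing sites).
# Each capture site maps to exactly one landing site.
nested = [[40, 0], [25, 0], [0, 35]]
# Hypothetical counts where the landing site is independent of the capture site.
independent = [[20, 20], [30, 30]]

print(f"nested:      {cramers_v(nested):.2f}")
print(f"independent: {cramers_v(independent):.2f}")
```

When one variable is fully determined by the other, as in the nested table, V reaches its maximum of 1. A result close to that would be expected here if LandingSite is derived from CaptureSite.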

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
landingsite_capturesitecategory_0 8469
46.9%
landingsite_capturesitecategory_2 4258
23.6%
landingsite_capturesitecategory_4 2946
 
16.3%
landingsite_capturesitecategory_1 2312
 
12.8%
landingsite_capturesitecategory_3 77
 
0.4%

Most occurring characters

ValueCountFrequency (%)
t 72248
12.1%
e 72248
12.1%
i 54186
9.1%
a 54186
9.1%
C 36124
 
6.1%
r 36124
 
6.1%
n 36124
 
6.1%
g 36124
 
6.1%
S 36124
 
6.1%
_ 36124
 
6.1%
Other values (11) 126434
21.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 451550
75.8%
Uppercase Letter 90310
 
15.2%
Connector Punctuation 36124
 
6.1%
Decimal Number 18062
 
3.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 72248
16.0%
e 72248
16.0%
i 54186
12.0%
a 54186
12.0%
r 36124
8.0%
n 36124
8.0%
g 36124
8.0%
y 18062
 
4.0%
o 18062
 
4.0%
u 18062
 
4.0%
Other values (2) 36124
8.0%
Decimal Number
ValueCountFrequency (%)
0 8469
46.9%
2 4258
23.6%
4 2946
 
16.3%
1 2312
 
12.8%
3 77
 
0.4%
Uppercase Letter
ValueCountFrequency (%)
C 36124
40.0%
S 36124
40.0%
L 18062
20.0%
Connector Punctuation
ValueCountFrequency (%)
_ 36124
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 541860
90.9%
Common 54186
 
9.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 72248
13.3%
e 72248
13.3%
i 54186
10.0%
a 54186
10.0%
C 36124
 
6.7%
r 36124
 
6.7%
n 36124
 
6.7%
g 36124
 
6.7%
S 36124
 
6.7%
y 18062
 
3.3%
Other values (5) 90310
16.7%
Common
ValueCountFrequency (%)
_ 36124
66.7%
0 8469
 
15.6%
2 4258
 
7.9%
4 2946
 
5.4%
1 2312
 
4.3%
3 77
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 596046
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 72248
12.1%
e 72248
12.1%
i 54186
9.1%
a 54186
9.1%
C 36124
 
6.1%
r 36124
 
6.1%
n 36124
 
6.1%
g 36124
 
6.1%
S 36124
 
6.1%
_ 36124
 
6.1%
Other values (11) 126434
21.2%

Species
Categorical

IMBALANCE 

Distinct: 8
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 141.2 KiB

Species_5: 11097
Species_6: 6882
Species_4: 50
Species_7: 21
Species_0: 7
Other values (3): 5

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 162558
Distinct characters: 15
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1
Unique (%): < 0.1%

Sample

1st row: Species_6
2nd row: Species_6
3rd row: Species_5
4th row: Species_6
5th row: Species_5

Common Values

Value      Count  Frequency (%)
Species_5  11097  61.4%
Species_6  6882   38.1%
Species_4  50     0.3%
Species_7  21     0.1%
Species_0  7      < 0.1%
Species_1  2      < 0.1%
Species_2  2      < 0.1%
Species_3  1      < 0.1%
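The 66.5% imbalance figure in the Alerts section can be reproduced from the counts above with an entropy-based score: one minus the Shannon entropy divided by its maximum for eight classes. That this is the formula the profiler uses is an assumption. The sketch only shows that it matches the reported number.

```python
import math

# Counts copied from the Common Values table above.
species_counts = {
    "Species_5": 11097,
    "Species_6": 6882,
    "Species_4": 50,
    "Species_7": 21,
    "Species_0": 7,
    "Species_1": 2,
    "Species_2": 2,
    "Species_3": 1,
}


def imbalance_score(class_counts):
    """Return 1 - H / log2(k): 0 for a uniform split, 1 for a single class."""
    total = sum(class_counts)
    k = len(class_counts)
    if k < 2:
        return 0.0
    # Shannon entropy in bits; empty classes contribute nothing.
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in class_counts if c > 0
    )
    return 1 - entropy / math.log2(k)


score = imbalance_score(list(species_counts.values()))
print(f"imbalance: {100 * score:.1f}%")
```

Two species account for 99.5% of rows, so the entropy is close to one bit out of a possible three, which is what drives the score to about two thirds.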

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
species_5 11097
61.4%
species_6 6882
38.1%
species_4 50
 
0.3%
species_7 21
 
0.1%
species_0 7
 
< 0.1%
species_1 2
 
< 0.1%
species_2 2
 
< 0.1%
species_3 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
e 36124
22.2%
S 18062
11.1%
p 18062
11.1%
c 18062
11.1%
i 18062
11.1%
s 18062
11.1%
_ 18062
11.1%
5 11097
 
6.8%
6 6882
 
4.2%
4 50
 
< 0.1%
Other values (5) 33
 
< 0.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 108372
66.7%
Uppercase Letter 18062
 
11.1%
Connector Punctuation 18062
 
11.1%
Decimal Number 18062
 
11.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
5 11097
61.4%
6 6882
38.1%
4 50
 
0.3%
7 21
 
0.1%
0 7
 
< 0.1%
1 2
 
< 0.1%
2 2
 
< 0.1%
3 1
 
< 0.1%
Lowercase Letter
ValueCountFrequency (%)
e 36124
33.3%
p 18062
16.7%
c 18062
16.7%
i 18062
16.7%
s 18062
16.7%
Uppercase Letter
ValueCountFrequency (%)
S 18062
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 18062
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 126434
77.8%
Common 36124
 
22.2%

Most frequent character per script

Common
ValueCountFrequency (%)
_ 18062
50.0%
5 11097
30.7%
6 6882
 
19.1%
4 50
 
0.1%
7 21
 
0.1%
0 7
 
< 0.1%
1 2
 
< 0.1%
2 2
 
< 0.1%
3 1
 
< 0.1%
Latin
ValueCountFrequency (%)
e 36124
28.6%
S 18062
14.3%
p 18062
14.3%
c 18062
14.3%
i 18062
14.3%
s 18062
14.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 162558
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 36124
22.2%
S 18062
11.1%
p 18062
11.1%
c 18062
11.1%
i 18062
11.1%
s 18062
11.1%
_ 18062
11.1%
5 11097
 
6.8%
6 6882
 
4.2%
4 50
 
< 0.1%
Other values (5) 33
 
< 0.1%

Tag_1
Text

Distinct: 8235
Distinct (%): 45.9%
Missing: 125
Missing (%): 0.7%
Memory size: 141.2 KiB

Length

Max length: 18
Median length: 6
Mean length: 6.164353
Min length: 4

Characters and Unicode

Total characters: 110570
Distinct characters: 45
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 5513
Unique (%): 30.7%

Sample

1st row: CC00147
2nd row: W442
3rd row: KE0376
4th row: CC00302
5th row: NotTagged_0113
ValueCountFrequency (%)
kes1306 117
 
0.7%
4858 90
 
0.5%
ke8098 81
 
0.5%
ke6133 81
 
0.5%
ke7799 75
 
0.4%
ke7358 74
 
0.4%
kes0447 72
 
0.4%
ke5559 69
 
0.4%
ke6245 68
 
0.4%
ke5630 67
 
0.4%
Other values (8126) 17152
95.6%

Most occurring characters

ValueCountFrequency (%)
K 14821
13.4%
E 14723
13.3%
0 9088
8.2%
1 8457
7.6%
6 7882
 
7.1%
7 7604
 
6.9%
3 7547
 
6.8%
8 7370
 
6.7%
5 6623
 
6.0%
4 5821
 
5.3%
Other values (35) 20634
18.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 71334
64.5%
Uppercase Letter 34997
31.7%
Lowercase Letter 3789
 
3.4%
Connector Punctuation 441
 
0.4%
Space Separator 9
 
< 0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
K 14821
42.3%
E 14723
42.1%
S 4030
 
11.5%
N 437
 
1.2%
T 425
 
1.2%
B 146
 
0.4%
C 109
 
0.3%
A 106
 
0.3%
L 74
 
0.2%
H 62
 
0.2%
Other values (8) 64
 
0.2%
Lowercase Letter
ValueCountFrequency (%)
g 869
22.9%
e 631
16.7%
o 497
13.1%
t 459
12.1%
a 450
11.9%
d 440
11.6%
k 144
 
3.8%
n 119
 
3.1%
s 110
 
2.9%
i 24
 
0.6%
Other values (5) 46
 
1.2%
Decimal Number
ValueCountFrequency (%)
0 9088
12.7%
1 8457
11.9%
6 7882
11.0%
7 7604
10.7%
3 7547
10.6%
8 7370
10.3%
5 6623
9.3%
4 5821
8.2%
2 5786
8.1%
9 5156
7.2%
Connector Punctuation
ValueCountFrequency (%)
_ 441
100.0%
Space Separator
ValueCountFrequency (%)
9
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 71784
64.9%
Latin 38786
35.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
K 14821
38.2%
E 14723
38.0%
S 4030
 
10.4%
g 869
 
2.2%
e 631
 
1.6%
o 497
 
1.3%
t 459
 
1.2%
a 450
 
1.2%
d 440
 
1.1%
N 437
 
1.1%
Other values (23) 1429
 
3.7%
Common
ValueCountFrequency (%)
0 9088
12.7%
1 8457
11.8%
6 7882
11.0%
7 7604
10.6%
3 7547
10.5%
8 7370
10.3%
5 6623
9.2%
4 5821
8.1%
2 5786
8.1%
9 5156
7.2%
Other values (2) 450
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 110570
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
K 14821
13.4%
E 14723
13.3%
0 9088
8.2%
1 8457
7.6%
6 7882
 
7.1%
7 7604
 
6.9%
3 7547
 
6.8%
8 7370
 
6.7%
5 6623
 
6.0%
4 5821
 
5.3%
Other values (35) 20634
18.7%

Tag_2
Text

MISSING 

Distinct: 246
Distinct (%): 5.0%
Missing: 13151
Missing (%): 72.8%
Memory size: 141.2 KiB

Length

Max length: 12
Median length: 4
Mean length: 4.1407045
Min length: 4

Characters and Unicode

Total characters: 20335
Distinct characters: 34
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 218
Unique (%): 4.4%

Sample

1st row: KE6344
2nd row: KEB7571
3rd row: NONE
4th row: none
5th row: none
Value               Count  Frequency (%)
none                4626   94.2%
3841                5      0.1%
keb7600             4      0.1%
ke6247              4      0.1%
e4496               4      0.1%
keb7536             4      0.1%
w446                3      0.1%
ke6444              3      0.1%
keb8295             3      0.1%
ke6306              3      0.1%
Other values (235)  252    5.1%
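Most of the non-missing Tag_2 cells hold the placeholder `none` (in varying case) rather than a tag. If those are counted as missing too, the column is far emptier than the 72.8% alert suggests. The sketch below works this through. It assumes every `none` token in the word table is a whole cell meaning "no second tag".

```python
# Figures copied from the Tag_2 summary and the word table above.
N_OBSERVATIONS = 18062
MISSING = 13151      # cells reported as missing
NONE_TOKENS = 4626   # occurrences of the word "none", case-insensitive

present = N_OBSERVATIONS - MISSING
none_share_of_present = 100 * NONE_TOKENS / present

# Assumption: each "none" token is one placeholder cell.
effective_missing = MISSING + NONE_TOKENS
effective_missing_pct = 100 * effective_missing / N_OBSERVATIONS
real_tags = N_OBSERVATIONS - effective_missing

print(f"non-missing cells:       {present}")
print(f"'none' among those:      {none_share_of_present:.1f}%")
print(f"effective missing share: {effective_missing_pct:.1f}%")
print(f"cells with a real tag:   {real_tags}")
```

Under that assumption only 285 rows carry an actual second tag, so mapping `none` to a proper missing value before analysis gives a more honest picture of the column.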

Most occurring characters

ValueCountFrequency (%)
n 8389
41.3%
e 4195
20.6%
o 4194
20.6%
N 864
 
4.2%
E 674
 
3.3%
O 432
 
2.1%
K 239
 
1.2%
0 194
 
1.0%
1 142
 
0.7%
8 130
 
0.6%
Other values (24) 882
 
4.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 16790
82.6%
Uppercase Letter 2411
 
11.9%
Decimal Number 1133
 
5.6%
Connector Punctuation 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 8389
50.0%
e 4195
25.0%
o 4194
25.0%
i 2
 
< 0.1%
s 2
 
< 0.1%
a 2
 
< 0.1%
t 1
 
< 0.1%
k 1
 
< 0.1%
g 1
 
< 0.1%
d 1
 
< 0.1%
Other values (2) 2
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
N 864
35.8%
E 674
28.0%
O 432
17.9%
K 239
 
9.9%
B 112
 
4.6%
L 72
 
3.0%
S 8
 
0.3%
C 4
 
0.2%
W 3
 
0.1%
A 2
 
0.1%
Decimal Number
ValueCountFrequency (%)
0 194
17.1%
1 142
12.5%
8 130
11.5%
7 125
11.0%
2 104
9.2%
4 97
8.6%
6 97
8.6%
5 96
8.5%
3 86
7.6%
9 62
 
5.5%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 19201
94.4%
Common 1134
 
5.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 8389
43.7%
e 4195
21.8%
o 4194
21.8%
N 864
 
4.5%
E 674
 
3.5%
O 432
 
2.2%
K 239
 
1.2%
B 112
 
0.6%
L 72
 
0.4%
S 8
 
< 0.1%
Other values (13) 22
 
0.1%
Common
ValueCountFrequency (%)
0 194
17.1%
1 142
12.5%
8 130
11.5%
7 125
11.0%
2 104
9.2%
4 97
8.6%
6 97
8.6%
5 96
8.5%
3 86
7.6%
9 62
 
5.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 20335
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 8389
41.3%
e 4195
20.6%
o 4194
20.6%
N 864
 
4.2%
E 674
 
3.3%
O 432
 
2.1%
K 239
 
1.2%
0 194
 
1.0%
1 142
 
0.7%
8 130
 
0.6%
Other values (24) 882
 
4.3%

Lost_Tags
Text

MISSING 

Distinct: 167
Distinct (%): 18.1%
Missing: 17137
Missing (%): 94.9%
Memory size: 141.2 KiB

Length

Max length: 17
Median length: 6
Mean length: 5.8897297
Min length: 4

Characters and Unicode

Total characters: 5448
Distinct characters: 18
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 57
Unique (%): 6.2%

Sample

1st row: 3641
2nd row: KE5963
3rd row: KE6375
4th row: KE5571
5th row: KE5869
ValueCountFrequency (%)
ke6491 116
 
12.5%
ke5571 75
 
8.1%
ke6375 56
 
6.0%
ke1457 26
 
2.8%
ke1436 24
 
2.6%
ke7164 23
 
2.5%
ke7716 22
 
2.4%
ke8122 20
 
2.1%
ke1805 19
 
2.0%
ke1638 17
 
1.8%
Other values (159) 533
57.3%

Most occurring characters

ValueCountFrequency (%)
K 852
15.6%
E 825
15.1%
1 698
12.8%
5 472
8.7%
6 445
8.2%
4 432
7.9%
7 395
7.3%
3 329
 
6.0%
9 265
 
4.9%
8 229
 
4.2%
Other values (8) 506
9.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 3688
67.7%
Uppercase Letter 1751
32.1%
Space Separator 6
 
0.1%
Math Symbol 3
 
0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 698
18.9%
5 472
12.8%
6 445
12.1%
4 432
11.7%
7 395
10.7%
3 329
8.9%
9 265
 
7.2%
8 229
 
6.2%
0 227
 
6.2%
2 196
 
5.3%
Uppercase Letter
ValueCountFrequency (%)
K 852
48.7%
E 825
47.1%
S 36
 
2.1%
A 27
 
1.5%
C 10
 
0.6%
Y 1
 
0.1%
Space Separator
ValueCountFrequency (%)
6
100.0%
Math Symbol
ValueCountFrequency (%)
+ 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 3697
67.9%
Latin 1751
32.1%

Most frequent character per script

Common
ValueCountFrequency (%)
1 698
18.9%
5 472
12.8%
6 445
12.0%
4 432
11.7%
7 395
10.7%
3 329
8.9%
9 265
 
7.2%
8 229
 
6.2%
0 227
 
6.1%
2 196
 
5.3%
Other values (2) 9
 
0.2%
Latin
ValueCountFrequency (%)
K 852
48.7%
E 825
47.1%
S 36
 
2.1%
A 27
 
1.5%
C 10
 
0.6%
Y 1
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 5448
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
K 852
15.6%
E 825
15.1%
1 698
12.8%
5 472
8.7%
6 445
8.2%
4 432
7.9%
7 395
7.3%
3 329
 
6.0%
9 265
 
4.9%
8 229
 
4.2%
Other values (8) 506
9.3%

T_Number
Text

MISSING 

Distinct38
Distinct (%)100.0%
Missing18024
Missing (%)99.8%
Memory size141.2 KiB
2023-08-30T14:53:20.885491image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/

Length

Max length: 19
Median length: 4
Mean length: 4.6052632
Min length: 4

Characters and Unicode

Total characters: 175
Distinct characters: 24
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
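
As a rough illustration of the character-category breakdown reported for text variables, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The function name `category_counts` and the partial `CATEGORY_NAMES` mapping are illustrative assumptions, not part of the profiling tool; the input values are the five sample rows shown for T_Number.

```python
import unicodedata
from collections import Counter

# Readable names for a few general-category codes (partial mapping, for illustration).
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Nd": "Decimal Number",
    "Pd": "Dash Punctuation",
    "Zs": "Space Separator",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)  # two-letter code such as "Lu" or "Nd"
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# The five T_Number sample rows listed in this report.
sample = ["T-123", "T501", "T002", "T559", "T558"]
for name, count in category_counts(sample).most_common():
    print(f"{name}: {count}")
```

Codes without an entry in the mapping are reported under their raw two-letter code, so the sketch still works for categories not listed here.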

Unique

Unique: 38
Unique (%): 100.0%

Sample

1st row: T-123
2nd row: T501
3rd row: T002
4th row: T559
5th row: T558
ValueCountFrequency (%)
t126 1
 
2.4%
t476 1
 
2.4%
t-123 1
 
2.4%
t564 1
 
2.4%
t501 1
 
2.4%
t002 1
 
2.4%
t559 1
 
2.4%
t558 1
 
2.4%
t562 1
 
2.4%
t561 1
 
2.4%
Other values (32) 32
76.2%

Most occurring characters

ValueCountFrequency (%)
T 38
21.7%
4 25
14.3%
0 14
 
8.0%
6 13
 
7.4%
5 12
 
6.9%
7 11
 
6.3%
1 9
 
5.1%
2 9
 
5.1%
9 9
 
5.1%
- 8
 
4.6%
Other values (14) 27
15.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 111
63.4%
Uppercase Letter 50
28.6%
Dash Punctuation 8
 
4.6%
Space Separator 4
 
2.3%
Lowercase Letter 2
 
1.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
T 38
76.0%
I 3
 
6.0%
F 1
 
2.0%
N 1
 
2.0%
D 1
 
2.0%
O 1
 
2.0%
U 1
 
2.0%
W 1
 
2.0%
H 1
 
2.0%
A 1
 
2.0%
Decimal Number
ValueCountFrequency (%)
4 25
22.5%
0 14
12.6%
6 13
11.7%
5 12
10.8%
7 11
9.9%
1 9
 
8.1%
2 9
 
8.1%
9 9
 
8.1%
8 5
 
4.5%
3 4
 
3.6%
Dash Punctuation
ValueCountFrequency (%)
- 8
100.0%
Space Separator
ValueCountFrequency (%)
4
100.0%
Lowercase Letter
ValueCountFrequency (%)
t 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 123
70.3%
Latin 52
29.7%

Most frequent character per script

Latin
ValueCountFrequency (%)
T 38
73.1%
I 3
 
5.8%
t 2
 
3.8%
F 1
 
1.9%
N 1
 
1.9%
D 1
 
1.9%
O 1
 
1.9%
U 1
 
1.9%
W 1
 
1.9%
H 1
 
1.9%
Other values (2) 2
 
3.8%
Common
ValueCountFrequency (%)
4 25
20.3%
0 14
11.4%
6 13
10.6%
5 12
9.8%
7 11
8.9%
1 9
 
7.3%
2 9
 
7.3%
9 9
 
7.3%
- 8
 
6.5%
8 5
 
4.1%
Other values (2) 8
 
6.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 175
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
T 38
21.7%
4 25
14.3%
0 14
 
8.0%
6 13
 
7.4%
5 12
 
6.9%
7 11
 
6.3%
1 9
 
5.1%
2 9
 
5.1%
9 9
 
5.1%
- 8
 
4.6%
Other values (14) 27
15.4%

CCL_cm
Real number (ℝ)

HIGH CORRELATION 

Distinct: 1338
Distinct (%): 7.4%
Missing: 24
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 43.09039
Minimum: 2
Maximum: 122.75
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 141.2 KiB

Quantile statistics

Minimum: 2
5-th percentile: 30.5
Q1: 36.33
median: 41.3
Q3: 47
95-th percentile: 64.1405
Maximum: 122.75
Range: 120.75
Interquartile range (IQR): 10.67

Descriptive statistics

Standard deviation: 11.004251
Coefficient of variation (CV): 0.25537599
Kurtosis: 4.9892171
Mean: 43.09039
Median Absolute Deviation (MAD): 5.3
Skewness: 1.3487914
Sum: 777264.46
Variance: 121.09354
Monotonicity: Not monotonic
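
Several of the CCL_cm figures above follow arithmetically from the others. The sketch below recomputes them from the reported mean, standard deviation, quartiles and extremes; the helper name `derived_stats` is hypothetical, and the inputs are the rounded values printed in this report, so the results agree only up to rounding.

```python
def derived_stats(mean, std, q1, q3, minimum, maximum):
    """Recompute statistics that are simple functions of other summary values."""
    return {
        "cv": std / mean,            # coefficient of variation
        "iqr": q3 - q1,              # interquartile range
        "range": maximum - minimum,  # full range
        "variance": std ** 2,        # variance is the squared standard deviation
    }


# Inputs copied from the CCL_cm statistics in this report.
ccl = derived_stats(mean=43.09039, std=11.004251, q1=36.33, q3=47,
                    minimum=2, maximum=122.75)
for name, value in ccl.items():
    print(f"{name}: {value:.5f}")
```

The same check can be applied to CCW_cm and Weight_Kg by substituting their reported values.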
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
40 172
 
1.0%
41 150
 
0.8%
42 142
 
0.8%
39 142
 
0.8%
41.5 142
 
0.8%
43 137
 
0.8%
42.5 134
 
0.7%
40.5 133
 
0.7%
39.5 133
 
0.7%
44 123
 
0.7%
Other values (1328) 16630
92.1%
ValueCountFrequency (%)
2 3
 
< 0.1%
3.5 1
 
< 0.1%
3.8 1
 
< 0.1%
4 2
 
< 0.1%
4.3 3
 
< 0.1%
4.4 5
 
< 0.1%
4.5 16
0.1%
4.6 3
 
< 0.1%
4.7 4
 
< 0.1%
4.8 5
 
< 0.1%
ValueCountFrequency (%)
122.75 1
< 0.1%
119.1 1
< 0.1%
119 1
< 0.1%
113.8 1
< 0.1%
112 2
< 0.1%
111.8 1
< 0.1%
111.5 1
< 0.1%
110 1
< 0.1%
108.8 1
< 0.1%
108.5 1
< 0.1%

CCW_cm
Real number (ℝ)

HIGH CORRELATION 

Distinct: 1262
Distinct (%): 7.0%
Missing: 27
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 40.253904
Minimum: 2
Maximum: 106
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 141.2 KiB

Quantile statistics

Minimum: 2
5-th percentile: 28.2
Q1: 34
median: 39.3
Q3: 44.1
95-th percentile: 58.7
Maximum: 106
Range: 104
Interquartile range (IQR): 10.1

Descriptive statistics

Standard deviation: 9.9330578
Coefficient of variation (CV): 0.24676011
Kurtosis: 4.9275772
Mean: 40.253904
Median Absolute Deviation (MAD): 5.1
Skewness: 1.2072039
Sum: 725979.16
Variance: 98.665638
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
40 183
 
1.0%
38 149
 
0.8%
39 145
 
0.8%
42 142
 
0.8%
41 139
 
0.8%
38.5 136
 
0.8%
41.5 130
 
0.7%
37 126
 
0.7%
40.5 125
 
0.7%
39.5 123
 
0.7%
Other values (1252) 16637
92.1%
ValueCountFrequency (%)
2 3
 
< 0.1%
3.5 1
 
< 0.1%
3.6 3
 
< 0.1%
4 7
< 0.1%
4.1 3
 
< 0.1%
4.2 10
0.1%
4.3 13
0.1%
4.4 7
< 0.1%
4.45 1
 
< 0.1%
4.5 3
 
< 0.1%
ValueCountFrequency (%)
106 1
< 0.1%
103.5 1
< 0.1%
103 1
< 0.1%
102.6 1
< 0.1%
102.3 1
< 0.1%
101.8 1
< 0.1%
101.7 1
< 0.1%
100.6 1
< 0.1%
100 2
< 0.1%
99.8 1
< 0.1%

Weight_Kg
Real number (ℝ)

HIGH CORRELATION  MISSING 

Distinct: 1937
Distinct (%): 15.3%
Missing: 5409
Missing (%): 29.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.8507311
Minimum: 0.02
Maximum: 140
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 141.2 KiB

Quantile statistics

Minimum: 0.02
5-th percentile: 3.04
Q1: 5
median: 7.5
Q3: 10.8
95-th percentile: 25.21
Maximum: 140
Range: 139.98
Interquartile range (IQR): 5.8

Descriptive statistics

Standard deviation: 9.7373782
Coefficient of variation (CV): 0.98849295
Kurtosis: 42.580834
Mean: 9.8507311
Median Absolute Deviation (MAD): 2.72
Skewness: 5.240521
Sum: 124641.3
Variance: 94.816534
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
8.5 240
 
1.3%
4.5 239
 
1.3%
6.5 237
 
1.3%
5.5 221
 
1.2%
7.5 210
 
1.2%
6 199
 
1.1%
5 191
 
1.1%
9.5 178
 
1.0%
4 176
 
1.0%
8 168
 
0.9%
Other values (1927) 10594
58.7%
(Missing) 5409
29.9%
ValueCountFrequency (%)
0.02 1
 
< 0.1%
0.03 7
 
< 0.1%
0.04 1
 
< 0.1%
0.05 2
 
< 0.1%
0.08 1
 
< 0.1%
0.1 22
0.1%
0.13 1
 
< 0.1%
0.15 6
 
< 0.1%
0.18 1
 
< 0.1%
0.19 1
 
< 0.1%
ValueCountFrequency (%)
140 1
< 0.1%
138.7 1
< 0.1%
136 1
< 0.1%
131.4 1
< 0.1%
127 1
< 0.1%
126.6 1
< 0.1%
124.8 1
< 0.1%
124.6 1
< 0.1%
122 1
< 0.1%
121.4 1
< 0.1%

Sex
Categorical

HIGH CORRELATION  IMBALANCE  MISSING 

Distinct: 4
Distinct (%): < 0.1%
Missing: 4330
Missing (%): 24.0%
Memory size: 141.2 KiB
Unknown: 13578
Female: 113
Male: 39
Not_Recorded: 2

Length

Max length: 12
Median length: 7
Mean length: 6.983979
Min length: 4

Characters and Unicode

Total characters: 95904
Distinct characters: 18
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Unknown
2nd row: Unknown
3rd row: Unknown
4th row: Unknown
5th row: Unknown

Common Values

ValueCountFrequency (%)
Unknown 13578
75.2%
Female 113
 
0.6%
Male 39
 
0.2%
Not_Recorded 2
 
< 0.1%
(Missing) 4330
 
24.0%
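
The Alerts section reports Sex as highly imbalanced (95.1%). One way to arrive at that figure, sketched below, is to take one minus the Shannon entropy of the non-missing class frequencies, normalised by the maximum entropy for that number of classes. This formula is an assumption that happens to reproduce the reported value from the counts above; the function name `imbalance_score` is illustrative.

```python
import math


def imbalance_score(counts):
    """Return 1 - H/H_max for class counts: 0 is balanced, values near 1 are imbalanced."""
    total = sum(counts)
    # Assumption: classes with a zero count are ignored.
    probs = [c / total for c in counts if c > 0]
    if len(probs) < 2:
        return 0.0  # a single class has no balance to measure
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(probs))


# Non-missing Sex counts from the table above: Unknown, Female, Male, Not_Recorded.
sex_counts = [13578, 113, 39, 2]
print(f"{imbalance_score(sex_counts):.1%}")
```

Missing values are excluded here, so the score describes only the 13732 rows with a recorded Sex value.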

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
unknown 13578
98.9%
female 113
 
0.8%
male 39
 
0.3%
not_recorded 2
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
n 40734
42.5%
o 13582
 
14.2%
U 13578
 
14.2%
k 13578
 
14.2%
w 13578
 
14.2%
e 269
 
0.3%
a 152
 
0.2%
l 152
 
0.2%
m 113
 
0.1%
F 113
 
0.1%
Other values (8) 55
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 82168
85.7%
Uppercase Letter 13734
 
14.3%
Connector Punctuation 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 40734
49.6%
o 13582
 
16.5%
k 13578
 
16.5%
w 13578
 
16.5%
e 269
 
0.3%
a 152
 
0.2%
l 152
 
0.2%
m 113
 
0.1%
d 4
 
< 0.1%
t 2
 
< 0.1%
Other values (2) 4
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
U 13578
98.9%
F 113
 
0.8%
M 39
 
0.3%
N 2
 
< 0.1%
R 2
 
< 0.1%
Connector Punctuation
ValueCountFrequency (%)
_ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 95902
> 99.9%
Common 2
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 40734
42.5%
o 13582
 
14.2%
U 13578
 
14.2%
k 13578
 
14.2%
w 13578
 
14.2%
e 269
 
0.3%
a 152
 
0.2%
l 152
 
0.2%
m 113
 
0.1%
F 113
 
0.1%
Other values (7) 53
 
0.1%
Common
ValueCountFrequency (%)
_ 2
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 95904
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 40734
42.5%
o 13582
 
14.2%
U 13578
 
14.2%
k 13578
 
14.2%
w 13578
 
14.2%
e 269
 
0.3%
a 152
 
0.2%
l 152
 
0.2%
m 113
 
0.1%
F 113
 
0.1%
Other values (8) 55
 
0.1%

TurtleCharacteristics
Text

Distinct: 16342
Distinct (%): 90.7%
Missing: 52
Missing (%): 0.3%
Memory size: 141.2 KiB

Length

Max length: 255
Median length: 199
Mean length: 83.438423
Min length: 1

Characters and Unicode

Total characters: 1502726
Distinct characters: 97
Distinct categories: 15
Distinct scripts: 2
Distinct blocks: 3
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 15902
Unique (%): 88.3%

Sample

1st row: algae at rear of shell
2nd row: multiple b's on front flippers& a lot of algae growth on shall - mostly towards rear
3rd row: clean
4th row: 1 b 3 CS+ calcerous algae at rear end of shell+ 9/10+ 10/11 RM has chips+ 9/10 LM has chip+ Left supracaudal is broken a bit at the end+ RF flipper is 1/2 missing and LF flipper the end is mising+ 'nails' are growing at the ends. Ends of RR and LR flip a
5th row: very lively+ right eye is hanging out + swollen+ left eye is closed + bleeding-possible from a speargun or infection or virus+ hump in 2 LLS + 2/3 CS
ValueCountFrequency (%)
on 26805
 
10.2%
algae 12319
 
4.7%
the 11898
 
4.5%
carapace 10133
 
3.9%
and 9674
 
3.7%
green 7489
 
2.9%
of 7423
 
2.8%
plastron 6309
 
2.4%
small 5803
 
2.2%
barnacles 5686
 
2.2%
Other values (7020) 158366
60.5%

Most occurring characters

ValueCountFrequency (%)
240435
16.0%
a 144069
 
9.6%
e 129667
 
8.6%
n 102130
 
6.8%
l 78779
 
5.2%
r 78344
 
5.2%
o 76245
 
5.1%
t 70943
 
4.7%
s 66077
 
4.4%
c 61239
 
4.1%
Other values (87) 454798
30.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1112824
74.1%
Space Separator 240435
 
16.0%
Uppercase Letter 77609
 
5.2%
Other Punctuation 38056
 
2.5%
Decimal Number 20351
 
1.4%
Control 5570
 
0.4%
Dash Punctuation 2918
 
0.2%
Math Symbol 2321
 
0.2%
Open Punctuation 1328
 
0.1%
Close Punctuation 1306
 
0.1%
Other values (5) 8
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 144069
12.9%
e 129667
11.7%
n 102130
 
9.2%
l 78779
 
7.1%
r 78344
 
7.0%
o 76245
 
6.9%
t 70943
 
6.4%
s 66077
 
5.9%
c 61239
 
5.5%
h 49715
 
4.5%
Other values (16) 255616
23.0%
Uppercase Letter
ValueCountFrequency (%)
S 13766
17.7%
F 11035
14.2%
L 11003
14.2%
R 9704
12.5%
C 5018
 
6.5%
M 4908
 
6.3%
T 4145
 
5.3%
G 3493
 
4.5%
N 3214
 
4.1%
B 3208
 
4.1%
Other values (16) 8115
10.5%
Other Punctuation
ValueCountFrequency (%)
. 24839
65.3%
& 7773
 
20.4%
, 3791
 
10.0%
/ 767
 
2.0%
' 643
 
1.7%
" 104
 
0.3%
; 47
 
0.1%
: 33
 
0.1%
% 29
 
0.1%
? 11
 
< 0.1%
Other values (5) 19
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 6973
34.3%
2 2723
 
13.4%
5 1894
 
9.3%
3 1818
 
8.9%
0 1586
 
7.8%
4 1511
 
7.4%
9 1244
 
6.1%
8 1046
 
5.1%
7 817
 
4.0%
6 739
 
3.6%
Math Symbol
ValueCountFrequency (%)
+ 2213
95.3%
= 54
 
2.3%
> 46
 
2.0%
| 6
 
0.3%
~ 1
 
< 0.1%
< 1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 1296
99.2%
] 9
 
0.7%
} 1
 
0.1%
Open Punctuation
ValueCountFrequency (%)
( 1312
98.8%
[ 16
 
1.2%
Modifier Symbol
ValueCountFrequency (%)
` 2
66.7%
^ 1
33.3%
Space Separator
ValueCountFrequency (%)
240435
100.0%
Control
ValueCountFrequency (%)
5570
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2918
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 2
100.0%
Other Symbol
ValueCountFrequency (%)
® 1
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%
Final Punctuation
ValueCountFrequency (%)
’ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1190433
79.2%
Common 312293
 
20.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 144069
12.1%
e 129667
 
10.9%
n 102130
 
8.6%
l 78779
 
6.6%
r 78344
 
6.6%
o 76245
 
6.4%
t 70943
 
6.0%
s 66077
 
5.6%
c 61239
 
5.1%
h 49715
 
4.2%
Other values (42) 333225
28.0%
Common
ValueCountFrequency (%)
240435
77.0%
. 24839
 
8.0%
& 7773
 
2.5%
1 6973
 
2.2%
5570
 
1.8%
, 3791
 
1.2%
- 2918
 
0.9%
2 2723
 
0.9%
+ 2213
 
0.7%
5 1894
 
0.6%
Other values (35) 13164
 
4.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1502724
> 99.9%
None 1
 
< 0.1%
Punctuation 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
240435
16.0%
a 144069
 
9.6%
e 129667
 
8.6%
n 102130
 
6.8%
l 78779
 
5.2%
r 78344
 
5.2%
o 76245
 
5.1%
t 70943
 
4.7%
s 66077
 
4.4%
c 61239
 
4.1%
Other values (85) 454796
30.3%
None
ValueCountFrequency (%)
® 1
100.0%
Punctuation
ValueCountFrequency (%)
’ 1
100.0%

Status
Text

MISSING 

Distinct: 439
Distinct (%): 3.0%
Missing: 3633
Missing (%): 20.1%
Memory size: 141.2 KiB

Length

Max length: 250
Median length: 8
Mean length: 9.1355603
Min length: 1

Characters and Unicode

Total characters: 131817
Distinct characters: 71
Distinct categories: 11
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 401
Unique (%): 2.8%

Sample

1st row: Released
2nd row: Released
3rd row: Released
4th row: Released
5th row: Released
ValueCountFrequency (%)
released 13610
76.7%
on 312
 
1.8%
admitted 292
 
1.6%
small 99
 
0.6%
algae 96
 
0.5%
and 91
 
0.5%
of 91
 
0.5%
the 81
 
0.5%
73
 
0.4%
b 71
 
0.4%
Other values (593) 2937
 
16.5%

Most occurring characters

ValueCountFrequency (%)
e 42654
32.4%
a 15101
 
11.5%
s 14679
 
11.1%
l 14664
 
11.1%
d 14662
 
11.1%
R 13851
 
10.5%
3862
 
2.9%
t 1478
 
1.1%
n 1199
 
0.9%
o 1014
 
0.8%
Other values (61) 8653
 
6.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 111619
84.7%
Uppercase Letter 15194
 
11.5%
Space Separator 3862
 
2.9%
Decimal Number 546
 
0.4%
Other Punctuation 382
 
0.3%
Math Symbol 92
 
0.1%
Dash Punctuation 43
 
< 0.1%
Connector Punctuation 28
 
< 0.1%
Open Punctuation 25
 
< 0.1%
Close Punctuation 25
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 42654
38.2%
a 15101
 
13.5%
s 14679
 
13.2%
l 14664
 
13.1%
d 14662
 
13.1%
t 1478
 
1.3%
n 1199
 
1.1%
o 1014
 
0.9%
i 1010
 
0.9%
r 933
 
0.8%
Other values (16) 4225
 
3.8%
Uppercase Letter
ValueCountFrequency (%)
R 13851
91.2%
S 313
 
2.1%
L 300
 
2.0%
A 296
 
1.9%
M 125
 
0.8%
F 125
 
0.8%
C 84
 
0.6%
B 45
 
0.3%
N 29
 
0.2%
T 6
 
< 0.1%
Other values (9) 20
 
0.1%
Decimal Number
ValueCountFrequency (%)
1 187
34.2%
2 82
15.0%
3 74
 
13.6%
4 53
 
9.7%
5 40
 
7.3%
0 33
 
6.0%
9 28
 
5.1%
8 22
 
4.0%
6 15
 
2.7%
7 12
 
2.2%
Other Punctuation
ValueCountFrequency (%)
& 242
63.4%
. 76
 
19.9%
' 37
 
9.7%
/ 21
 
5.5%
? 3
 
0.8%
: 2
 
0.5%
… 1
 
0.3%
Math Symbol
ValueCountFrequency (%)
+ 89
96.7%
> 3
 
3.3%
Close Punctuation
ValueCountFrequency (%)
) 24
96.0%
] 1
 
4.0%
Space Separator
ValueCountFrequency (%)
3862
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 43
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 28
100.0%
Open Punctuation
ValueCountFrequency (%)
( 25
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 126813
96.2%
Common 5004
 
3.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 42654
33.6%
a 15101
 
11.9%
s 14679
 
11.6%
l 14664
 
11.6%
d 14662
 
11.6%
R 13851
 
10.9%
t 1478
 
1.2%
n 1199
 
0.9%
o 1014
 
0.8%
i 1010
 
0.8%
Other values (35) 6501
 
5.1%
Common
ValueCountFrequency (%)
3862
77.2%
& 242
 
4.8%
1 187
 
3.7%
+ 89
 
1.8%
2 82
 
1.6%
. 76
 
1.5%
3 74
 
1.5%
4 53
 
1.1%
- 43
 
0.9%
5 40
 
0.8%
Other values (16) 256
 
5.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 131816
> 99.9%
Punctuation 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 42654
32.4%
a 15101
 
11.5%
s 14679
 
11.1%
l 14664
 
11.1%
d 14662
 
11.1%
R 13851
 
10.5%
3862
 
2.9%
t 1478
 
1.1%
n 1199
 
0.9%
o 1014
 
0.8%
Other values (60) 8652
 
6.6%
Punctuation
ValueCountFrequency (%)
… 1
100.0%

ReleaseSite
Text

Distinct: 271
Distinct (%): 1.5%
Missing: 75
Missing (%): 0.4%
Memory size: 141.2 KiB

Length

Max length: 168
Median length: 14
Mean length: 14.089731
Min length: 5

Characters and Unicode

Total characters: 253432
Distinct characters: 58
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 198
Unique (%): 1.1%

Sample

1st row: ReleaseSite_50
2nd row: ReleaseSite_62
3rd row: ReleaseSite_50
4th row: ReleaseSite_50
5th row: ReleaseSite_62
ValueCountFrequency (%)
releasesite_62 9951
52.3%
releasesite_11 2091
 
11.0%
releasesite_18 1603
 
8.4%
releasesite_68 1596
 
8.4%
releasesite_50 566
 
3.0%
releasesite_19 433
 
2.3%
releasesite_0 273
 
1.4%
releasesite_8 233
 
1.2%
released 214
 
1.1%
on 113
 
0.6%
Other values (377) 1936
 
10.2%

Most occurring characters

ValueCountFrequency (%)
e 71370
28.2%
a 18204
 
7.2%
s 18121
 
7.2%
l 18084
 
7.1%
t 17871
 
7.1%
R 17844
 
7.0%
i 17807
 
7.0%
S 17664
 
7.0%
_ 17563
 
6.9%
6 11733
 
4.6%
Other values (48) 27171
 
10.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 164084
64.7%
Uppercase Letter 35740
 
14.1%
Decimal Number 34688
 
13.7%
Connector Punctuation 17563
 
6.9%
Space Separator 1238
 
0.5%
Other Punctuation 96
 
< 0.1%
Math Symbol 8
 
< 0.1%
Dash Punctuation 6
 
< 0.1%
Open Punctuation 5
 
< 0.1%
Close Punctuation 4
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 71370
43.5%
a 18204
 
11.1%
s 18121
 
11.0%
l 18084
 
11.0%
t 17871
 
10.9%
i 17807
 
10.9%
n 429
 
0.3%
d 368
 
0.2%
o 340
 
0.2%
r 262
 
0.2%
Other values (15) 1228
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
R 17844
49.9%
S 17664
49.4%
L 90
 
0.3%
F 49
 
0.1%
M 42
 
0.1%
C 30
 
0.1%
B 16
 
< 0.1%
A 2
 
< 0.1%
V 1
 
< 0.1%
D 1
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
6 11733
33.8%
2 10119
29.2%
1 6415
18.5%
8 3480
 
10.0%
0 905
 
2.6%
5 842
 
2.4%
9 468
 
1.3%
7 398
 
1.1%
3 237
 
0.7%
4 91
 
0.3%
Other Punctuation
ValueCountFrequency (%)
& 69
71.9%
. 11
 
11.5%
' 8
 
8.3%
/ 6
 
6.2%
? 2
 
2.1%
Math Symbol
ValueCountFrequency (%)
+ 7
87.5%
> 1
 
12.5%
Connector Punctuation
ValueCountFrequency (%)
_ 17563
100.0%
Space Separator
ValueCountFrequency (%)
1238
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 6
100.0%
Open Punctuation
ValueCountFrequency (%)
( 5
100.0%
Close Punctuation
ValueCountFrequency (%)
) 4
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 199824
78.8%
Common 53608
 
21.2%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 71370
35.7%
a 18204
 
9.1%
s 18121
 
9.1%
l 18084
 
9.0%
t 17871
 
8.9%
R 17844
 
8.9%
i 17807
 
8.9%
S 17664
 
8.8%
n 429
 
0.2%
d 368
 
0.2%
Other values (26) 2062
 
1.0%
Common
ValueCountFrequency (%)
_ 17563
32.8%
6 11733
21.9%
2 10119
18.9%
1 6415
 
12.0%
8 3480
 
6.5%
1238
 
2.3%
0 905
 
1.7%
5 842
 
1.6%
9 468
 
0.9%
7 398
 
0.7%
Other values (12) 447
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 253432
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 71370
28.2%
a 18204
 
7.2%
s 18121
 
7.2%
l 18084
 
7.1%
t 17871
 
7.1%
R 17844
 
7.0%
i 17807
 
7.0%
S 17664
 
7.0%
_ 17563
 
6.9%
6 11733
 
4.6%
Other values (48) 27171
 
10.7%

Date_TimeRelease
Text

MISSING 

Distinct: 3008
Distinct (%): 25.2%
Missing: 6108
Missing (%): 33.8%
Memory size: 141.2 KiB

Length

Max length: 143
Median length: 8
Mean length: 8.3081814
Min length: 1

Characters and Unicode

Total characters: 99316
Distinct characters: 55
Distinct categories: 10
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 791
Unique (%): 6.6%

Sample

1st row: 22/12/00
2nd row: 28/10/01
3rd row: 01/11/01
4th row: 11/03/02
5th row: 08/08/02
ValueCountFrequency (%)
releasesite_62 147
 
1.2%
released 88
 
0.7%
on 48
 
0.4%
releasesite_50 45
 
0.4%
releasesite_19 27
 
0.2%
04/11/11 24
 
0.2%
03/11/11 21
 
0.2%
05/10/11 20
 
0.2%
releasesite_18 20
 
0.2%
18/11/17 19
 
0.2%
Other values (3094) 11900
96.3%
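
The frequency table above shows that Date_TimeRelease mixes dates with stray text such as `releasesite_62` and `released`, which is presumably why the column is typed as Text. A minimal sketch for separating the two, assuming the dates follow the day/month/two-digit-year pattern seen in the sample rows (`22/12/00` alongside a capture date of 2000-12-22), is given below. The function name `split_release_values` is hypothetical.

```python
from datetime import datetime


def split_release_values(values, fmt="%d/%m/%y"):
    """Split raw strings into parsed dates and values that do not match fmt."""
    parsed, rejected = [], []
    for value in values:
        try:
            parsed.append(datetime.strptime(value.strip(), fmt).date())
        except ValueError:
            rejected.append(value)  # keep the raw text for manual review
    return parsed, rejected


# Values taken from the sample rows and frequency table in this report.
dates, leftovers = split_release_values(["22/12/00", "ReleaseSite_62", "04/11/11"])
print(dates)
print(leftovers)
```

Reviewing the rejected values before discarding them seems advisable, since they may be ReleaseSite or Status entries that ended up in the wrong column.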

Most occurring characters

ValueCountFrequency (%)
1 23423
23.6%
/ 22984
23.1%
0 13902
14.0%
2 9027
 
9.1%
3 3896
 
3.9%
8 3618
 
3.6%
6 3444
 
3.5%
4 3440
 
3.5%
5 3363
 
3.4%
7 3281
 
3.3%
Other values (45) 8938
 
9.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 69554
70.0%
Other Punctuation 23000
 
23.2%
Lowercase Letter 5153
 
5.2%
Uppercase Letter 802
 
0.8%
Space Separator 496
 
0.5%
Connector Punctuation 290
 
0.3%
Math Symbol 10
 
< 0.1%
Open Punctuation 4
 
< 0.1%
Close Punctuation 4
 
< 0.1%
Dash Punctuation 3
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1628
31.6%
a 573
 
11.1%
s 529
 
10.3%
l 511
 
9.9%
t 420
 
8.2%
i 398
 
7.7%
n 178
 
3.5%
d 147
 
2.9%
o 139
 
2.7%
r 116
 
2.3%
Other values (13) 514
 
10.0%
Decimal Number
ValueCountFrequency (%)
1 23423
33.7%
0 13902
20.0%
2 9027
 
13.0%
3 3896
 
5.6%
8 3618
 
5.2%
6 3444
 
5.0%
4 3440
 
4.9%
5 3363
 
4.8%
7 3281
 
4.7%
9 2160
 
3.1%
Uppercase Letter
ValueCountFrequency (%)
R 402
50.1%
S 316
39.4%
F 30
 
3.7%
L 25
 
3.1%
C 12
 
1.5%
M 10
 
1.2%
B 4
 
0.5%
E 2
 
0.2%
P 1
 
0.1%
Other Punctuation
ValueCountFrequency (%)
/ 22984
99.9%
& 11
 
< 0.1%
. 3
 
< 0.1%
' 1
 
< 0.1%
% 1
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
+ 8
80.0%
> 1
 
10.0%
= 1
 
10.0%
Space Separator
ValueCountFrequency (%)
496
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 290
100.0%
Open Punctuation
ValueCountFrequency (%)
( 4
100.0%
Close Punctuation
ValueCountFrequency (%)
) 4
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 93361
94.0%
Latin 5955
 
6.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1628
27.3%
a 573
 
9.6%
s 529
 
8.9%
l 511
 
8.6%
t 420
 
7.1%
R 402
 
6.8%
i 398
 
6.7%
S 316
 
5.3%
n 178
 
3.0%
d 147
 
2.5%
Other values (22) 853
14.3%
Common
ValueCountFrequency (%)
1 23423
25.1%
/ 22984
24.6%
0 13902
14.9%
2 9027
 
9.7%
3 3896
 
4.2%
8 3618
 
3.9%
6 3444
 
3.7%
4 3440
 
3.7%
5 3363
 
3.6%
7 3281
 
3.5%
Other values (13) 2983
 
3.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 99316
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 23423
23.6%
/ 22984
23.1%
0 13902
14.0%
2 9027
 
9.1%
3 3896
 
3.9%
8 3618
 
3.6%
6 3444
 
3.5%
4 3440
 
3.5%
5 3363
 
3.4%
7 3281
 
3.3%
Other values (45) 8938
 
9.0%

Interactions

[Interaction plots: nine images, not preserved]

Correlations

Variable | CCL_cm | CCW_cm | Weight_Kg | Researcher | CaptureSite | ForagingGround | CaptureMethod | LandingSite | Species | Sex
CCL_cm | 1.000 | 0.978 | 0.964 | 0.084 | 0.156 | 0.159 | 0.137 | 0.125 | 0.295 | 0.492
CCW_cm | 0.978 | 1.000 | 0.955 | 0.082 | 0.154 | 0.152 | 0.137 | 0.120 | 0.303 | 0.517
Weight_Kg | 0.964 | 0.955 | 1.000 | 0.000 | 0.099 | 0.102 | 0.161 | 0.066 | 0.294 | 0.527
Researcher | 0.084 | 0.082 | 0.000 | 1.000 | 0.079 | 0.070 | 0.142 | 0.093 | 0.127 | 0.000
CaptureSite | 0.156 | 0.154 | 0.099 | 0.079 | 1.000 | 0.692 | 0.204 | 0.999 | 0.260 | 0.113
ForagingGround | 0.159 | 0.152 | 0.102 | 0.070 | 0.692 | 1.000 | 0.159 | 0.692 | 0.198 | 0.084
CaptureMethod | 0.137 | 0.137 | 0.161 | 0.142 | 0.204 | 0.159 | 1.000 | 0.211 | 0.105 | 0.131
LandingSite | 0.125 | 0.120 | 0.066 | 0.093 | 0.999 | 0.692 | 0.211 | 1.000 | 0.155 | 0.071
Species | 0.295 | 0.303 | 0.294 | 0.127 | 0.260 | 0.198 | 0.105 | 0.155 | 1.000 | 0.321
Sex | 0.492 | 0.517 | 0.527 | 0.000 | 0.113 | 0.084 | 0.131 | 0.071 | 0.321 | 1.000
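
The high-correlation alerts in the Overview can be cross-checked against this matrix. The sketch below lists every variable pair whose coefficient is at least 0.5; that threshold is an assumption chosen because it is consistent with the alerts listed (for example, Sex is flagged against CCW_cm and Weight_Kg but not against CCL_cm at 0.492). The function name `high_correlation_pairs` is illustrative.

```python
from itertools import combinations

NAMES = ["CCL_cm", "CCW_cm", "Weight_Kg", "Researcher", "CaptureSite",
         "ForagingGround", "CaptureMethod", "LandingSite", "Species", "Sex"]

# Correlation matrix copied from this report (rows and columns follow NAMES).
MATRIX = [
    [1.000, 0.978, 0.964, 0.084, 0.156, 0.159, 0.137, 0.125, 0.295, 0.492],
    [0.978, 1.000, 0.955, 0.082, 0.154, 0.152, 0.137, 0.120, 0.303, 0.517],
    [0.964, 0.955, 1.000, 0.000, 0.099, 0.102, 0.161, 0.066, 0.294, 0.527],
    [0.084, 0.082, 0.000, 1.000, 0.079, 0.070, 0.142, 0.093, 0.127, 0.000],
    [0.156, 0.154, 0.099, 0.079, 1.000, 0.692, 0.204, 0.999, 0.260, 0.113],
    [0.159, 0.152, 0.102, 0.070, 0.692, 1.000, 0.159, 0.692, 0.198, 0.084],
    [0.137, 0.137, 0.161, 0.142, 0.204, 0.159, 1.000, 0.211, 0.105, 0.131],
    [0.125, 0.120, 0.066, 0.093, 0.999, 0.692, 0.211, 1.000, 0.155, 0.071],
    [0.295, 0.303, 0.294, 0.127, 0.260, 0.198, 0.105, 0.155, 1.000, 0.321],
    [0.492, 0.517, 0.527, 0.000, 0.113, 0.084, 0.131, 0.071, 0.321, 1.000],
]


def high_correlation_pairs(names, matrix, threshold=0.5):
    """Return (name_a, name_b, value) for off-diagonal pairs at or above threshold."""
    pairs = []
    for i, j in combinations(range(len(names)), 2):
        if abs(matrix[i][j]) >= threshold:
            pairs.append((names[i], names[j], matrix[i][j]))
    # Strongest relationships first.
    return sorted(pairs, key=lambda pair: -abs(pair[2]))


for a, b, value in high_correlation_pairs(NAMES, MATRIX):
    print(f"{a} ~ {b}: {value:.3f}")
```

Only the upper triangle is scanned, so each symmetric pair is reported once.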

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
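
To make the idea of nullity correlation concrete, the sketch below computes a Pearson correlation between the present/absent indicators of two columns. It uses plain Python lists with `None` marking a missing value; the function name `nullity_correlation` and the toy columns are hypothetical and are not taken from the dataset.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation of presence indicators (1 = present, 0 = missing)."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None  # a fully present or fully missing column has no variation
    return cov / math.sqrt(var_a * var_b)


# Toy columns: the second is missing exactly where the first is present.
weights = [7.5, None, 10.8, None]
notes = [None, "clean", None, "algae"]
print(nullity_correlation(weights, notes))
```

A value near 1 means two columns tend to be missing together, a value near -1 means one is present when the other is missing, and columns with no missing values fall out of the comparison.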

Sample

Index | Rescue_ID | Date_TimeCaught | Researcher | CaptureSite | ForagingGround | CaptureMethod | Fisher | LandingSite | Species | Tag_1 | Tag_2 | Lost_Tags | T_Number | CCL_cm | CCW_cm | Weight_Kg | Sex | TurtleCharacteristics | Status | ReleaseSite | Date_TimeRelease
0 | 2000_RE_0060 | 2000-12-22 | Researcher_25 | CaptureSite_0 | Ocean | Net | Fisher_1072 | LandingSite_CaptureSiteCategory_2 | Species_6 | CC00147 | NaN | NaN | NaN | 64.70 | 62.60 | NaN | Unknown | algae at rear of shell | Released | ReleaseSite_50 | 22/12/00
1 | 2001_RE_0187 | 2001-10-28 | Researcher_6 | CaptureSite_0 | Ocean | Net | Fisher_520 | LandingSite_CaptureSiteCategory_2 | Species_6 | W442 | NaN | NaN | NaN | 35.85 | 31.35 | NaN | Unknown | multiple b's on front flippers& a lot of algae growth on shall - mostly towards rear | Released | ReleaseSite_62 | 28/10/01
2 | 2001_RE_0197 | 2001-11-01 | Researcher_6 | CaptureSite_0 | Ocean | Net | Fisher_1669 | LandingSite_CaptureSiteCategory_2 | Species_5 | KE0376 | NaN | NaN | NaN | 51.80 | 49.20 | NaN | Unknown | clean | Released | ReleaseSite_50 | 01/11/01
3 | 2002_RE_0031 | 2002-03-11 | Researcher_32 | CaptureSite_0 | Ocean | Net | Fisher_1798 | LandingSite_CaptureSiteCategory_2 | Species_6 | CC00302 | NaN | NaN | NaN | 60.50 | 59.00 | NaN | Unknown | 1 b 3 CS+ calcerous algae at rear end of shell+ 9/10+ 10/11 RM has chips+ 9/10 LM has chip+ Left supracaudal is broken a bit at the end+ RF flipper is 1/2 missing and LF flipper the end is mising+ 'nails' are growing at the ends. Ends of RR and LR flip a | Released | ReleaseSite_50 | 11/03/02
4 | 2002_RE_0118 | 2002-08-08 | Researcher_25 | CaptureSite_0 | Ocean | Beached | Fisher_1918 | LandingSite_CaptureSiteCategory_2 | Species_5 | NotTagged_0113 | NaN | NaN | NaN | 34.70 | 33.00 | NaN | Unknown | very lively+ right eye is hanging out + swollen+ left eye is closed + bleeding-possible from a speargun or infection or virus+ hump in 2 LLS + 2/3 CS | Released | ReleaseSite_62 | 08/08/02
5 | 2002_RE_0119 | 2002-08-10 | Researcher_25 | CaptureSite_0 | Ocean | Not_Recorded | Fisher_1918 | LandingSite_CaptureSiteCategory_2 | Species_5 | NotTagged_0114 | NaN | NaN | NaN | 33.20 | 30.70 | NaN | Unknown | large chip 6 LM | Released | ReleaseSite_8 | 10/08/02
6 | 2002_RE_0214 | 2002-10-21 | Researcher_25 | CaptureSite_0 | Ocean | Net | Fisher_2013 | LandingSite_CaptureSiteCategory_2 | Species_6 | KA460 | NaN | NaN | NaN | 37.40 | 33.50 | NaN | Unknown | green and whitish-pink calcerous algae growth towards rear end of shell+ small mark on top of head+ small scar on nuchal+ small b's on flippers and neck. | Released | ReleaseSite_62 | 21/10/02
7 | 2002_RE_0215 | 2002-10-21 | Researcher_25 | CaptureSite_0 | Ocean | Net | Fisher_1815 | LandingSite_CaptureSiteCategory_2 | Species_6 | KA442 | NaN | NaN | NaN | 40.30 | 37.20 | NaN | Unknown | calcerous algae growth-especially rear end and some red algae+ 1 RMS is split+ 7 RMS has a chip+ 8 LMS has a chip+ 10 RMS has a hole+ 10 LMS has a small chip+ small hole 2 CS | Released | ReleaseSite_50 | 21/10/02
8 | 2002_RE_0218 | 2002-10-22 | Researcher_30 | CaptureSite_0 | Ocean | Net | Fisher_1815 | LandingSite_CaptureSiteCategory_2 | Species_5 | KA466 | NaN | NaN | NaN | 44.00 | 43.20 | NaN | Unknown | clean shell | Released | ReleaseSite_50 | 22/10/02
9 | 2003_RE_0187 | 2003-06-09 | Researcher_32 | CaptureSite_0 | Ocean | Net | Fisher_1066 | LandingSite_CaptureSiteCategory_2 | Species_5 | KE1184 | NaN | NaN | NaN | 48.30 | 43.30 | NaN | Unknown | 10 LMS has a small chip& right rear flipper is 3/4 missing& 11 RMS has a small chip& pink calcerous algae growth on left supra. | Released | ReleaseSite_62 | NaN

Index | Rescue_ID | Date_TimeCaught | Researcher | CaptureSite | ForagingGround | CaptureMethod | Fisher | LandingSite | Species | Tag_1 | Tag_2 | Lost_Tags | T_Number | CCL_cm | CCW_cm | Weight_Kg | Sex | TurtleCharacteristics | Status | ReleaseSite | Date_TimeRelease
18052 | 2018_RE_1496 | 2018-12-16 | Researcher_30 | CaptureSite_9 | Ocean | Net | Fisher_1343 | LandingSite_CaptureSiteCategory_1 | Species_6 | KE8098 | none | NaN | NaN | 52.90 | 49.40 | 18.70 | Unknown | Green algae on carapace\nMissing tip of RRF\nShell flaking slightly | Released | ReleaseSite_68 | 16/12/18
18053 | 2018_RE_1497 | 2018-12-16 | Researcher_30 | CaptureSite_9 | Ocean | Net | Fisher_1472 | LandingSite_CaptureSiteCategory_1 | Species_6 | KES1492 | none | NaN | NaN | 37.00 | 34.03 | 5.55 | Unknown | Light pinkish calcareous algae on plastron\nLight green algae on carapace | Released | ReleaseSite_68 | 16/12/18
18054 | 2018_RE_1507 | 2018-12-17 | Researcher_30 | CaptureSite_9 | Ocean | Net | Fisher_1216 | LandingSite_CaptureSiteCategory_1 | Species_6 | KE8681 | none | NaN | NaN | 46.47 | 42.57 | 12.13 | Unknown | algae on carapace \n12 LMS | Released | ReleaseSite_68 | 17/12/18
18055 | 2018_RE_1508 | 2018-12-17 | Researcher_30 | CaptureSite_9 | Ocean | Net | Fisher_1216 | LandingSite_CaptureSiteCategory_1 | Species_6 | KES1306 | none | KE6491 | NaN | 64.93 | 59.97 | 31.55 | Unknown | Green algae on carapace \nSmall barnacles on shoulders\nTag lump LRF | Released | ReleaseSite_68 | 17/12/18
18056 | 2018_RE_1510 | 2018-12-18 | Researcher_30 | CaptureSite_9 | Ocean | Net | Fisher_1550 | LandingSite_CaptureSiteCategory_1 | Species_5 | KES1796 | NaN | NaN | NaN | 48.03 | 45.87 | 13.07 | Unknown | Tip of RFF missing\nBruising on front flippers | Released | ReleaseSite_68 | 18/12/18
18057 | 2018_RE_1511 | 2018-12-18 | Researcher_30 | CaptureSite_9 | Ocean | Net | Fisher_569 | LandingSite_CaptureSiteCategory_1 | Species_5 | KES1828 | NaN | NaN | NaN | 57.13 | 50.57 | 21.09 | Unknown | White calcareous algae on carapace | Released | ReleaseSite_68 | 18/12/18
18058 | 2018_RE_1514 | 2018-12-18 | Researcher_30 | CaptureSite_9 | Ocean | Net | Fisher_125 | LandingSite_CaptureSiteCategory_1 | Species_6 | KES0563 | NaN | KES0416 | NaN | 42.07 | 38.37 | 9.02 | Unknown | Calcareous + green algae on carapace\nBarnacles on shoulders | Released | ReleaseSite_68 | 18/12/18
18059 | 2018_RE_1532 | 2018-12-24 | Researcher_30 | CaptureSite_9 | Ocean | Net | Fisher_1343 | LandingSite_CaptureSiteCategory_1 | Species_5 | KES1833 | NaN | NaN | NaN | 57.20 | 52.30 | NaN | Unknown | Clean turtle | Released | ReleaseSite_68 | 24/12/18
18060 | 2018_RE_1533 | 2018-12-24 | Researcher_30 | CaptureSite_9 | Ocean | Net | Fisher_1551 | LandingSite_CaptureSiteCategory_1 | Species_5 | KES1831 | NaN | NaN | NaN | 51.90 | 48.50 | NaN | Unknown | Green algae on carapace\ntip of left supra missing | Released | ReleaseSite_68 | 24/12/18
18061 | 2018_RE_1550 | 2018-12-28 | Researcher_30 | CaptureSite_9 | Ocean | Net | Fisher_1551 | LandingSite_CaptureSiteCategory_1 | Species_6 | KES1432 | none | NaN | NaN | 34.60 | 31.20 | 4.29 | Unknown | Thick pink patches of calcareous algae on carapace. Green algae on shell and plastron. | Released | ReleaseSite_37 | 28/12/18